The opioid epidemic is a serious public health problem in the United States. The National Vital Statistics System provides data made available by the CDC and U.S. Department of Health regarding national mortality for drug overdoses over the past several years. The goal of this project is to perform statistical analyses to learn about the relationships between variables in the data set, geospatial analyses to understand the critical centers of the epidemic, as well as forcasting to predict future overdose trajectories.
Over the past 20 years, the United States has experienced an increase in the amount of deaths that can be attributed to drug overdoses. In particular, opioids have been at the heart of this problem, including both prescription medications and illegal opioids. Opioids are used to treat pain by interacting with opioid receptors in the body and brain. In addition to pain relief, opioids also elicit a feeling of pleasure. This combination of pain relief and pleasure provides some explanation for the addictive quality of opioid drugs. Over-prescribing by clinicians and access to illicit drugs have allowed this to become a severe epidemic that has shown little indication of slowing down, that both local and national agencies are attempting to address.
To understand the nature of the problem of drug overdose mortality, an interdisciplinary approach can provide the most insight. Firstly this is a public health issue, requiring the knowledge of clinicians, psychiatrists, and public health officials. This is also an economic and geographic issue, as socioeconomic status and location may have a large affect on drug overdose mortality. Using additional data sets from SAMHSA, NIDA and the DEA to integrate with the drug overdose mortality data will provide more insight for correlative analysis and trends regarding the opioid epidemic.
The primary data set used for this project was obtained from the publicly available data from the National Vital Statistics System. The link to the data download is here. The data analyzed is titled “VSRR Provisional Drug Overdose Death Counts” and contains provisional monthly counts for drug overdose deaths and total number of deaths (from mortality data) for all 50 states in the United States over the years 2015-2017. The data also includes information on the specific drug categories determined to be the cause of overdose for 18 states.
The main goal is to analyze the data with some exploratory analysis as well as graphical analysis to visualize data in a meaningful way to help understand the scale and severity of the opioid epidemic in the United States.
First, all necessary packages are loaded first to keep the code organized.
# load necessary packages
library(tidyverse)
library(sf)
library(RColorBrewer)
library(tigris)
library(tmap)
library(tmaptools)
library(grid)
library(leaflet)
library(forecast)
library(ggthemes)Next, data needs to be loaded and cleaned for analysis. The dplyr package from tidyverse will be used to organize the data. First, all of the necessary variables will be created to perform some geospatial analysis to look at the total number of drug overdoses by state for the years 2015, 2016, and 2017.
# load in data
data <- read.csv("VSRR_drug_od.csv", header=TRUE)
# need for subsequent analysis
drop.cols <- "State.Name"
# total deaths by state for each year
totalDeathsbyState <- data %>%
select(-one_of(drop.cols)) %>%
group_by(State, Year, Indicator) %>%
filter(Year %in% c("2015", "2016", "2017"),
Indicator == "Number of Deaths",
State != "US",
State != "YC") %>%
summarize(TotalDeath = sum(Data.Value))
drugODbyState <- data %>%
select(-one_of(drop.cols)) %>%
group_by(State, Year, Indicator) %>%
filter(Year %in% c("2015", "2016", "2017"),
Indicator == "Number of Drug Overdose Deaths",
State != "US",
State != "YC") %>%
summarize(TotalODDeath = sum(Data.Value))After grouping and filtering data, the data can be joined to make it easier for subsequent analysis. The data can then be summarized to show some basic statistics for the data values. Additionally, the percentage of overdose deaths per state per year can be calculated and added as a new column.
totalDeathsandOD <- inner_join(drugODbyState, totalDeathsbyState, by=c("State", "Year"))
summary(totalDeathsandOD)## State Year Indicator.x TotalODDeath Indicator.y TotalDeath
## AK : 3 Min. :2015 Number of Drug Overdose Deaths :153 Min. : 727 Number of Deaths :153 Min. : 50086
## AL : 3 1st Qu.:2015 Cocaine (T40.5) : 0 1st Qu.: 3866 Cocaine (T40.5) : 0 1st Qu.: 172396
## AR : 3 Median :2016 Heroin (T40.1) : 0 Median : 9526 Heroin (T40.1) : 0 Median : 461471
## AZ : 3 Mean :2016 Methadone (T40.3) : 0 Mean :13598 Methadone (T40.3) : 0 Mean : 631255
## CA : 3 3rd Qu.:2017 Natural & semi-synthetic opioids (T40.2): 0 3rd Qu.:17540 Natural & semi-synthetic opioids (T40.2): 0 3rd Qu.: 775307
## CO : 3 Max. :2017 Number of Deaths : 0 Max. :65422 Number of Drug Overdose Deaths : 0 Max. :3202079
## (Other):135 (Other) : 0 (Other) : 0
percentOD <- totalDeathsandOD %>%
mutate(percent = ((TotalODDeath / TotalDeath) * 100))A histogram can be used to get an idea of the distribution of the overdose percentages for the years 2015-2017.
ggplot(data=percentOD, aes(x=percent)) +
geom_histogram(bins=20, aes(y=..density..), colour="black", fill="white") +
ggtitle("Overdose Death Percentage 2015-2017") +
geom_density(alpha=.2, fill="lightblue")Because geographical analysis is a goal, loading in the coordinate data is necessary. Our map shapefile was downloaded from the US Census Bureau. The package sf is used to import the shapefile as an sf object.
# importing state shapefile as sf object
US.sf <- st_read("cb_2017_us_state_20m/cb_2017_us_state_20m.shp")## Reading layer `cb_2017_us_state_20m' from data source `/Users/kateslovik/BMIN503_Final_Project/cb_2017_us_state_20m/cb_2017_us_state_20m.shp' using driver `ESRI Shapefile'
## Simple feature collection with 52 features and 9 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -179.1743 ymin: 17.91377 xmax: 179.7739 ymax: 71.35256
## epsg (SRID): 4269
## proj4string: +proj=longlat +datum=NAD83 +no_defs
str(US.sf)## Classes 'sf' and 'data.frame': 52 obs. of 10 variables:
## $ STATEFP : Factor w/ 52 levels "01","02","04",..: 2 5 6 9 13 14 16 18 19 21 ...
## $ STATENS : Factor w/ 52 levels "00068085","00294478",..: 51 22 23 17 27 28 29 30 16 19 ...
## $ AFFGEOID: Factor w/ 52 levels "0400000US01",..: 2 5 6 9 13 14 16 18 19 21 ...
## $ GEOID : Factor w/ 52 levels "01","02","04",..: 2 5 6 9 13 14 16 18 19 21 ...
## $ STUSPS : Factor w/ 52 levels "AK","AL","AR",..: 1 5 6 8 14 15 13 18 19 21 ...
## $ NAME : Factor w/ 52 levels "Alabama","Alaska",..: 2 5 6 9 13 14 16 18 19 21 ...
## $ LSAD : Factor w/ 1 level "00": 1 1 1 1 1 1 1 1 1 1 ...
## $ ALAND : num 1.48e+12 4.03e+11 2.68e+11 1.58e+08 2.14e+11 ...
## $ AWATER : num 2.78e+11 2.05e+10 1.18e+09 1.87e+07 2.39e+09 ...
## $ geometry:sfc_MULTIPOLYGON of length 52; first list element: List of 47
## ..$ :List of 1
## .. ..$ : num [1:13, 1:2] -173 -173 -173 -173 -172 ...
## ..$ :List of 1
## .. ..$ : num [1:37, 1:2] -175 -175 -175 -175 -175 ...
## ..$ :List of 1
## .. ..$ : num [1:39, 1:2] -177 -177 -177 -177 -177 ...
## ..$ :List of 1
## .. ..$ : num [1:17, 1:2] -178 -177 -177 -177 -177 ...
## ..$ :List of 1
## .. ..$ : num [1:19, 1:2] -178 -178 -178 -178 -178 ...
## ..$ :List of 1
## .. ..$ : num [1:7, 1:2] -179 -179 -179 -179 -179 ...
## ..$ :List of 1
## .. ..$ : num [1:6, 1:2] -179 -179 -179 -179 -179 ...
## ..$ :List of 1
## .. ..$ : num [1:7, 1:2] -179 -179 -179 -179 -179 ...
## ..$ :List of 1
## .. ..$ : num [1:9, 1:2] 179 180 180 180 180 ...
## ..$ :List of 1
## .. ..$ : num [1:14, 1:2] 179 179 179 179 179 ...
## ..$ :List of 1
## .. ..$ : num [1:7, 1:2] 178 179 179 179 179 ...
## ..$ :List of 1
## .. ..$ : num [1:6, 1:2] 178 178 178 178 178 ...
## ..$ :List of 1
## .. ..$ : num [1:7, 1:2] 178 178 178 178 178 ...
## ..$ :List of 1
## .. ..$ : num [1:20, 1:2] 177 177 177 177 178 ...
## ..$ :List of 1
## .. ..$ : num [1:7, 1:2] 174 174 174 174 174 ...
## ..$ :List of 1
## .. ..$ : num [1:12, 1:2] 173 174 174 174 174 ...
## ..$ :List of 1
## .. ..$ : num [1:20, 1:2] 172 173 173 173 173 ...
## ..$ :List of 1
## .. ..$ : num [1:77, 1:2] -134 -134 -134 -134 -134 ...
## ..$ :List of 1
## .. ..$ : num [1:63, 1:2] -134 -134 -134 -134 -134 ...
## ..$ :List of 1
## .. ..$ : num [1:45, 1:2] -135 -135 -135 -135 -135 ...
## ..$ :List of 1
## .. ..$ : num [1:78, 1:2] -137 -137 -136 -136 -136 ...
## ..$ :List of 1
## .. ..$ : num [1:7, 1:2] -152 -152 -152 -152 -152 ...
## ..$ :List of 1
## .. ..$ : num [1:8, 1:2] -154 -153 -153 -153 -153 ...
## ..$ :List of 1
## .. ..$ : num [1:121, 1:2] -155 -155 -155 -155 -154 ...
## ..$ :List of 1
## .. ..$ : num [1:14, 1:2] -155 -155 -155 -154 -154 ...
## ..$ :List of 1
## .. ..$ : num [1:9, 1:2] -156 -156 -156 -156 -156 ...
## ..$ :List of 1
## .. ..$ : num [1:7, 1:2] -157 -157 -157 -157 -157 ...
## ..$ :List of 1
## .. ..$ : num [1:8, 1:2] -157 -157 -157 -157 -157 ...
## ..$ :List of 1
## .. ..$ : num [1:24, 1:2] -160 -160 -160 -160 -160 ...
## ..$ :List of 1
## .. ..$ : num [1:19, 1:2] -161 -161 -161 -161 -160 ...
## ..$ :List of 1
## .. ..$ : num [1:9, 1:2] -161 -161 -161 -161 -161 ...
## ..$ :List of 1
## .. ..$ : num [1:6, 1:2] -161 -161 -161 -161 -161 ...
## ..$ :List of 1
## .. ..$ : num [1:5, 1:2] -162 -162 -162 -162 -162 ...
## ..$ :List of 1
## .. ..$ : num [1:9, 1:2] -163 -163 -162 -162 -162 ...
## ..$ :List of 1
## .. ..$ : num [1:14, 1:2] -166 -165 -165 -165 -165 ...
## ..$ :List of 1
## .. ..$ : num [1:13, 1:2] -166 -166 -166 -166 -165 ...
## ..$ :List of 1
## .. ..$ : num [1:44, 1:2] -167 -167 -167 -167 -167 ...
## ..$ :List of 1
## .. ..$ : num [1:58, 1:2] -168 -168 -168 -167 -167 ...
## ..$ :List of 1
## .. ..$ : num [1:1244, 1:2] -168 -168 -168 -167 -167 ...
## ..$ :List of 1
## .. ..$ : num [1:33, 1:2] -169 -169 -169 -169 -169 ...
## ..$ :List of 1
## .. ..$ : num [1:7, 1:2] -170 -170 -169 -169 -170 ...
## ..$ :List of 1
## .. ..$ : num [1:16, 1:2] -170 -170 -170 -170 -170 ...
## ..$ :List of 1
## .. ..$ : num [1:7, 1:2] -170 -170 -170 -170 -170 ...
## ..$ :List of 1
## .. ..$ : num [1:8, 1:2] -171 -171 -171 -171 -171 ...
## ..$ :List of 1
## .. ..$ : num [1:6, 1:2] -171 -171 -171 -171 -171 ...
## ..$ :List of 1
## .. ..$ : num [1:54, 1:2] -172 -172 -172 -172 -172 ...
## ..$ :List of 1
## .. ..$ : num [1:9, 1:2] -173 -173 -172 -172 -172 ...
## ..- attr(*, "class")= chr "XY" "MULTIPOLYGON" "sfg"
## - attr(*, "sf_column")= chr "geometry"
## - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA
## ..- attr(*, "names")= chr "STATEFP" "STATENS" "AFFGEOID" "GEOID" ...
US.sf <- US.sf %>%
select(STATENS, AFFGEOID, GEOID, STUSPS, NAME, LSAD, ALAND, AWATER, geometry) %>%
rename(State='STUSPS')The geographical data can then be joined to the drug overdose data.
# join by state in order to make map plots
USdeaths <- inner_join(US.sf, drugODbyState, by = "State")More data manipulation will be employed in the Results section as part of several subsequent analyses.
Describe your results and include relevant tables, plots, and code/comments used to obtain them. End with a brief conclusion of your findings related to the question you set out to address. You can include references if you’d like, but this is not required.
Now, we can visualize the total number of overdose deaths across the entire United States by implementing the package tmap. Using a for loop to iterate over the 3 years of interest, a static map for each year can be generated and saved.
# setting up for plotting static maps
for (year in list("2015", "2016", "2017")){
US_cont <- USdeaths %>%
filter(Year == year) %>%
subset(!GEOID %in% c("02", "15", "72")) %>%
simplify_shape(0.2)
US_AK <- USdeaths %>%
filter(Year == year) %>%
subset(GEOID == "02") %>%
simplify_shape(0.2)
US_HI <- USdeaths %>%
filter(Year == year) %>%
subset(GEOID == "15") %>%
simplify_shape(0.2)
US_states <- US_cont %>%
dplyr::select(geometry) %>%
aggregate(by = list(US_cont$State), FUN = mean)
# Contiguous US map
m_cont <- tm_shape(US_cont, projection = 2163) +
tm_polygons("TotalODDeath", title = "Number of Deaths", showNA = FALSE,
border.col = "white", border.alpha = .5,
style = "fixed",
labels = c("0 to 1k", "1k to 2.5k",
"2.5k to 5k", "5k to 10k",
"10k to 15k", "15k to 25k",
"25k to 50k", "50k to 100k"),
breaks = c(0, 1000, 2500, 5000, 10000, 15000, 25000, 50000, 100000),
palette = "YlOrRd") +
tm_shape(US_states) +
tm_borders(lwd=1, col = "black", alpha = .5) +
tm_layout(title = paste("Total Number of Overdose Deaths in the US,", year),
title.size = 1,
title.position = c("center", "top"),
legend.position = c("right", "bottom"),
legend.width = 0.5,
legend.title.size = 0.8,
legend.text.size = 0.5,
legend.outside = FALSE,
frame = FALSE,
inner.margins = c(0.1, 0.1, 0.1, 0.1))
# Alaska map
m_AK <- tm_shape(US_AK, projection = 3338) +
tm_polygons("TotalODDeath",
border.col = "white",
border.alpha = .5,
style = "fixed",
labels = c("0 to 1k", "1k to 2.5k",
"2.5k to 5k", "5k to 10k",
"10k to 15k", "15k to 25k",
"25k to 50k", "50k to 100k"),
breaks = c(0, 1000, 2500, 5000, 10000, 15000, 25000, 50000, 100000),
palette = "YlOrRd") +
tm_layout("Alaska",
legend.show = FALSE,
bg.color = NA,
title.size = 0.8,
frame = FALSE)
# Hawaii map
m_HI <- tm_shape(US_HI, projection = 3759) +
tm_polygons("TotalODDeath",
border.col = "white",
border.alpha = .5,
style = "fixed",
labels = c("0 to 1k", "1k to 2.5k",
"2.5k to 5k", "5k to 10k",
"10k to 15k", "15k to 25k",
"25k to 50k", "50k to 100k"),
breaks = c(0, 1000, 2500, 5000, 10000, 15000, 25000, 50000, 100000),
palette = "YlOrRd") +
tm_layout("Hawaii",
legend.show = FALSE,
bg.color = NA,
title.position = c("LEFT", "BOTTOM"),
title.size = 0.8,
frame = FALSE)
# Use grid package to set where AK and HI map should be plotted on the contiguous map
vp_AK <- viewport(x = 0.15, y = 0.15, width = 0.3, height = 0.3)
vp_HI <- viewport(x = 0.4, y = 0.1, width = 0.2, height = 0.1)
# plot map to save individual figures
tmap_save(m_cont,
insets_tm = list(m_AK, m_HI),
insets_vp = list(vp_AK, vp_HI),
filename=paste("ODinUS", year, ".png", sep=""))
graphics.off() # clears after each plot
}The exported .png maps can be re-imported and printed for viewing:
ADD DESCRIPTION OF RESULTS FROM MAPS HERE
Next, the percentage of deaths attributed to drug overdoses was calculated for each state for the years 2015, 2016 and 2017. The data for each year can be plotted as a interactive choropleth map using the leaflet package.
## Leaflet Maps
# OD percentage join with shapefile
percentODinUS <- inner_join(US.sf, percentOD, by = "State")
ODmap2015 <- percentODinUS %>% filter(Year == "2015")
ODmap2016 <- percentODinUS %>% filter(Year == "2016")
ODmap2017 <- percentODinUS %>% filter(Year == "2017")
# favorite color palette
pal_fun <- colorNumeric("PuRd", NULL)
# 2015
# popup message
pu_message2015 <- paste0(ODmap2015$NAME,
"<br>Drug Overdose Rate: ",
round(ODmap2015$percent, 1), "%")
# leaflet map for OD percentage in 2015
leaflet(ODmap2015) %>%
addPolygons(stroke = TRUE, weight=1, color="white",
fillColor = ~pal_fun(percent),
fillOpacity = 1,
popup = pu_message2015) %>%
setView(lat = 39.8283, lng = -98.5795, zoom = 3) %>%
addProviderTiles(providers$CartoDB.Positron) %>%
addLegend("bottomright",
pal=pal_fun,
values=~percent,
title = "Drug Overdose Rate, 2015 (%)",
opacity = 1) %>%
addScaleBar()# 2016
# popup message
pu_message2016 <- paste0(ODmap2016$NAME,
"<br>Drug Overdose Rate: ",
round(ODmap2016$percent, 1), "%")
# leaflet map for OD percentage in 2016
leaflet(ODmap2016) %>%
addPolygons(stroke = TRUE, weight=1, color="white",
fillColor = ~pal_fun(percent),
fillOpacity = 1,
popup = pu_message2016) %>%
setView(lat = 39.8283, lng = -98.5795, zoom = 3) %>%
addProviderTiles(providers$CartoDB.Positron) %>%
addLegend("bottomright",
pal=pal_fun,
values=~percent,
title = "Drug Overdose Rate, 2016 (%)",
opacity = 1) %>%
addScaleBar()# 2017
# popup message
pu_message2017 <- paste0(ODmap2017$NAME,
"<br>Drug Overdose Rate: ",
round(ODmap2017$percent, 1), "%")
# leaflet map for OD percentage in 2017
leaflet(ODmap2017) %>%
addPolygons(stroke = TRUE, weight=1, color="white",
fillColor = ~pal_fun(percent),
fillOpacity = 1,
popup = pu_message2017) %>%
setView(lat = 39.8283, lng = -98.5795, zoom = 3) %>%
addProviderTiles(providers$CartoDB.Positron) %>%
addLegend("bottomright",
pal=pal_fun,
values=~percent,
title = "Drug Overdose Rate, 2017 (%)",
opacity = 1) %>%
addScaleBar()ADD DESCRIPTION OF RESULTS HERE
Want to implement a leaflet map to show percent change from 2015 to 2017… need to play with the data though